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Genetic changes affecting gene expression contribute to phenotypic divergence; thus, understanding how regulatory 
networks controlling gene expression change over time is critical for understanding evolution. Prior studies of expression 
differences within and between species have identified properties of regulatory divergence, but technical and biological 
differences among these studies make it difficult to assess the generality of these properties or to understand how reg- 
ulatory changes accumulate with divergence time. Here, we address these issues by comparing gene expression among 
strains and species of Drosophila with a range of divergence times and use Fj hybrids to examine inheritance patterns and 
disentangle cis- and fra/?$-reguIatory changes. We find that the fixation of compensatory changes has caused the regulation 
of gene expression to diverge more rapidly than gene expression itself. Specifically, we observed that the proportion of 
genes with evidence of c/5-reguIatory divergence has increased more rapidly with divergence time than the proportion 
of genes with evidence of expression differences. Surprisingly, the amount of expression divergence explained by cis- 
regulatory changes did not increase steadily with divergence time, as was previously proposed. Rather, one species 
[Drosophila sechellia) showed an excess of c/5-reguIatory divergence that we argue most likely resulted from positive selection 
in this lineage. Taken together, this work reveals not only the rate at which gene expression evolves, but also the molecular 
and evolutionary mechanisms responsible for this evolution. 



[Supplemental material is available for this article.] 

Understanding the relationship between tempo (the rate at which 
a trait evolves) and mode (the manner in which a trait evolves) is 
essential for understanding the evolutionary process (Simpson 
1944). This is true not only for organismal phenotypes, but also 
for the molecular phenotypes that produce organismal traits. 
Gene expression is one such molecular phenotype (Gordon and 
Ruvinsky 2012); it is essential for organismal form, fitness, and 
function, and frequently varies within and between species. 
Comparative studies using genomic surveys of gene expression 
in yeast (Busby et al. 2011), Drosophila (Rifkin et al. 2003), and 
mammalian species (Brawand et al. 2011) with a range of diver- 
gence times have provided insight into the tempo of gene ex- 
pression evolution, but the mode and its relationship to tempo 
remain less well understood. 

Elucidating the mode of gene expression evolution includes 
identifying the types of regulatory changes that have evolved as 
well as how interactions among divergent regulatory alleles affect 
gene expression. F x hybrids, in which divergent regulatory alleles 
interact in the same cellular environment, can be used to in- 
vestigate these issues. Allele-specific expression in F x hybrids sep- 
arates the effects of cis- and trans-regulatory changes affecting a 
gene's expression by providing a readout of relative ds-regulatory 
activity in a common trans-regulatory environment (Cowles et al. 
2002). Expression differences between genotypes not attributed to 
ds-regulatory changes are inferred to be caused by trans-regulatory 
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divergence (Wittkopp et al. 2004). In addition, the net effects of 
interactions among divergent regulatory alleles are revealed by 
comparing levels of total expression in Fi hybrids to parental 
genotypes. 

This approach was initially used to separate cis- and trans- 
regulatory effects of divergence affecting expression of dozens of 
genes. These studies suggested that (1) ds-regulatory changes are 
more common than trans-regulatory changes between species 
(Wittkopp et al. 2004); (2) genes with cis- and trans-acting changes 
favoring expression of opposite alleles are more likely than other 
types of changes to cause misexpression in Fi hybrids (Landry et al. 
2005); (3) environmental factors modulate relative ds-regulatory 
activity (de Meaux et al. 2006); (4) ds-regulatory variation is 
abundant in natural populations (Osada et al. 2006; Genissel et al. 
2007; Campbell et al. 2008; Gruber and Long 2009); and (5) the 
amount of expression divergence attributable to ds-acting changes 
is greater between than within species (Wittkopp et al. 2008). 

More recently, microarrays and RNA-seq have been used to 
extend these analyses to the genomic scale (Wang et al. 2008; 
Graze et al. 2009; Tirosh et al. 2009; Zhang and Borevitz 2009; 
Fontanillas et al. 2010; McManus et al. 2010; He et al. 2012; Shi 
et al. 2012; Coolon and Wittkopp 2013; Levy et al. 2013; Schaefke 
et al. 2013). In some cases, relationships seen in the smaller scale 
studies were replicated. For example, cis- and trans-regulatory 
changes with effects in opposite directions were overrepresented 
among misexpressed genes (Tirosh et al. 2009; McManus et al. 
2010; Schaefke et al. 2013) and ds-regulatory changes explained 
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more of the expression differences between than within species 
(Tirosh et al. 2009; Emerson et al. 2010). Other observations, such 
as the relative proportion of genes with evidence of cis- and/or 
trans-regulatory changes, were much more variable among studies. 
Finally, novel patterns, such as the relationship between domi- 
nance and cis /trans-regulatory changes (Lemos et al. 2008; McManus 
et al. 2010) and the frequency of compensatory cis- and trans- 
regulatory variants (Tirosh et al. 2009; Goncalves et al. 2012; Shi 
et al. 2012), were identified. 

Despite this growing collection of case studies examining the 
types of changes responsible for expression differences within 
and/or between particular pairs of species, the use of different or- 
ganisms (flies, yeast, plants, and mice), techniques (pyrosequencing, 
microarrays, RNA-seq), and analysis methods (linear models, exact 
tests, and Bayesian approaches) among these studies precludes the 
type of meta-analysis needed to determine how the mode of regu- 
latory evolution changes with divergence time and to robustly as- 
sess the generality of relationships reported in previous studies. To 
address these issues, we examined the tempo and mode of regula- 
tory evolution in concert using strains and species of Drosophila 
with a range of divergence times. 

Results 

Experimental overview 

mRNA abundance was compared among (1) African and non- 
African strains of Drosophila melanogaster (mel-mel), which have 
been geographically isolated for —10,000 yr and show evidence of 
behavioral isolation (David and Capy 1988; Lachaise et al. 1988; 
Wu et al. 1995; Hollocher et al. 1997) and expression divergence 
(Hutter et al. 2008); (2) D. simulans and D. sechellia (sim-sech), 
which diverged -250,000 yr ago (Garrigan et al. 2012); and (3) 
D. melanogaster and D. simulans (mel-sim), which diverged —2.5 
million yr ago (Fig. 1A; Cutter 2008). For each of these genotypes, 



we derived a strain-specific genome sequence and used RNA-seq to 
measure mRNA abundance (hereafter referred to as expression) in 
a pool of 20 adult female flies. Reciprocal crosses were performed 
for each of the three pairs of genotypes (mel-mel, sim-sech, and 
mel-sim), and RNA-seq was used to measure both total and allele- 
specific expression in pools of 20 female F x hybrids from each cross 
(Fig. IB). Sequence divergence observed in transcribed regions of 
these strains correlated with published estimates of divergence 
time (Fig. 1C) as well as the number of RNA-seq reads informative 
for allele-specific expression (Fig. ID). Gene-specific and allele- 
specific read counts were used to investigate regulatory evolution as 
shown in Supplemental Figure SI. 

Quantifying gene expression levels 

For each comparison (mel-mel, sim-sech, and mel-sim), RNA-seq 
reads from the two strains or species and their F x hybrids were 
aligned to the relevant genomes and mapped to specific genes. 
Differences in sequencing depth among libraries (Supplemental 
Table SI) were eliminated by using random sampling without re- 
placement to produce a data set with the same number of mapped 
reads for each sample. After excluding genes with fewer than 20 
mapped reads in any sample (Supplemental Table S2), 7587 genes 
were deemed suitable for comparing total expression levels be- 
tween all pairs of genotypes and their F x hybrids (Data set 1), which 
is 83% of the genes classified as expressed in D. melanogaster adult 
females by modENCODE (Graveley et al. 2011). 

Measures of relative gene expression derived from these 
mapped and normalized RNA-seq data correlated well with esti- 
mates of relative gene expression derived from independent 
pyrosequencing experiments (Supplemental Fig. S2A; Ahmadian 
et al. 2000). Genome-wide, expression levels between Fi hybrids 
from reciprocal crosses were also highly correlated (Fig. 2A; Sup- 
plemental Fig. S3). Despite this similarity, Fisher's exact tests (FETs) 
with a false discovery rate (FDR) of 0.05 identified significant 
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Figure 1 . Studying regulatory evolution in the melanogaster group of Drosophila. (A) Phylogenetic relationships and estimated divergence times for the 
strains and species analyzed are shown. (B) Sequencing libraries for RNA-seq data were derived from mRNA isolated from each species and strain as well as 
Ft hybrids from reciprocal crosses, in which the maternal and paternal genotypes were reversed (e.g., SI x S2 and S2 x SI). (C) The percent sequence 
divergence observed in the regions of the genome used to map RNA-seq reads (/-axis) is compared with published estimates of divergence time (x-axis). 
(D) The proportion of reads from each gene that is allele-specific is shown for the mel-mel (blue), sim-sech (red), and mel-sim (green) comparisons. 
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Figure 2. Expression divergence between genotypes and in F-| hybrids. (A) Overall expression di- 
vergence (1 - p) is shown for the mel-mel, sim-sech, and mel-sim comparisons in red, with the data used 
for these calculations shown in Supplemental Figure S5. Average differences in expression between Ft 
hybrids and each of the parental species are shown in blue, with the data used for these calculations 
shown in Supplemental Figure S7. Expression divergence between reciprocal Ft hybrids is included as 
a baseline in green, with the data used for these calculations shown in Supplemental Figure S3. In this 
and all other figures, results from each comparison are plotted using the genomic sequence divergence 
observed between the genotypes involved (Fig. 1C). (B) The proportion of genes showing evidence of 
significant expression differences between genotypes (red), the average proportion of genes showing 
significant expression differences between Fq hybrids and each parental species (blue), and the pro- 
portion of genes with significant expression differences between reciprocal F-i hybrid genotypes (green) 
are shown. (C) The line plots show expression differences for individual genes in the mel-mel, sim-sech, 
and mel-sim comparisons plotted according to divergence time, with the 7587 genes included in Data 
set 1 classified into nine groups depending on whether they showed increased, decreased, or similar 
expression differences between mel-mel and sim-sech and between sim-sech and mel-sim. The red line 
in each plot shows the median expression difference for genes in that class for each comparison. The pie 
chart shows the relative frequency of genes in each class. (D) The proportion of genes showing ex- 
pression levels in Ft hybrids consistent with additive inheritance (red), dominant inheritance (green), 
misexpression (blue), or similar expression (purple) is shown for each comparison. Data used to cal- 
culate these proportions are shown in Supplemental Figure S8. Error bars in panel A show the 95% 
quantiles from 1 0,000 bootstrap replicates in which differences in 1 - p between mel-mel and sim-sech 
as well as between sim-sech and mel-sim were calculated for each bootstrap replicate. The significance 
of the observed deviation from zero was determined by comparing the observed value to the distri- 
bution of bootstrap values. In panels B and D, significance was determined using Fisher's exact tests. 
Significance of each comparison: (*) P < 0.05, (**) P < 0.001, (***) P < 1 X 1 0 4 . 



expression differences between reciprocal hybrids for 26%-49% 
of individual genes (Fig. 2B). Most of these significant expression 
differences were small in magnitude (median expression difference = 
1.20- to 1.25-fold) (Supplemental Fig. S4), however, they reflect 
the sensitivity of the Fisher's exact test for detecting differences in 
relative expression from RNA-seq data when read counts are high. 
These differences in expression between hybrids from reciprocal 
crosses provide a conservative baseline for expression differences 



detected in the mel-mel, sim-sech, and mel- 
sim comparisons because they include 
variance from technical and biological 
replication as well as parent-of-origin 
effects. 



Evolution of expression differences 

To determine the tempo of regulatory 
divergence, we compared total expres- 
sion levels in the mel-mel, sim-sech, and 
mel-sim comparisons for the set of 7587 
genes in Data set 1 described above. First, 
we analyzed overall expression diver- 
gence (1 - Spearman's p, see Methods) 
and found that it increased consistently 
and significantly with divergence time 
(Fig. 2A; Supplemental Fig. S5). We then 
used FETs to compare expression levels 
for individual genes and determine 
whether the increased overall expression 
divergence resulted from more genes with 
divergent expression or more divergent 
expression of similar numbers of genes. 
Surprisingly, we found that the propor- 
tion of genes with significant expression 
differences did not increase consistently 
with divergence time (Fig. 2B), sug- 
gesting that increasing magnitudes of 
expression differences rather than in- 
creasing numbers of genes with diver- 
gent expression drive the overall increase 
in expression differences with divergence 
time observed. 

We also examined the evolutionary 
trajectories of individual genes by as- 
signing each of the 7587 genes in Data 
set 1 to one of nine classes depending on 
whether its expression difference increased, 
decreased, or remained similar between 
mel-mel and sim-sech and between sim- 
sech and mel-sim. Expression differences 
less than 1.25-fold were considered simi- 
lar for this analysis to minimize the im- 
pact of small but statistically significant 
expression differences (Supplemental 
Fig. S6). Despite observing that expres- 
sion differences increased with diver- 
gence time on a genomic scale (Fig. 2A), 
this pattern was only seen for 2% of in- 
dividual genes (Fig. 2C, class I). Expres- 
sion differences of similar magnitude in 
all three comparisons were much more 
common (43% of all genes examined) and 
tended to be small in magnitude (median expression difference = 
1.18-fold) (Fig. 2C, class II). The remaining 55% of genes fell into one 
of seven categories in which two of the three comparisons showed 
similar expression differences (Fig. 2C, class III). Interestingly, nearly 
half (45%) of such genes showed similar expression differences in 
mel-mel and mel-sim but larger or smaller expression differences in 
the sim-sech comparison (Fig. 2C, IIIc and Hid), which has an in- 
termediate divergence time. 



Genome Research 799 



www.genome.org 



Coolon et al. 



Evolution of regulatory incompatibilities 

Divergence of the regulatory networks controlling gene expression 
can cause misexpression in F x hybrids that can contribute to spe- 
ciation (Meiklejohn et al. 2003; Michalak and Noor 2004; Ranz 
et al. 2004; Haerty and Singh 2006; Moehring et al. 2007; 
Maheshwari and Barbash 2012). This can occur, for example, when 
proteins and/or DNA with sequence-specific interactions coevolve 
such that divergent alleles of the interacting molecules do not 
function properly together in F x hybrids. To determine the rate at 
which misexpression resulting from such regulatory incom- 
patibilities evolves, we compared expression levels in mel-mel, 
sim-sech, and mel-sim F x hybrids to expression levels in the cor- 
responding parental genotypes. We found that overall expression 
differences between parents and F x hybrids increased with di- 
vergence time, most dramatically in the mel-sim comparisons (Fig. 
2A; Supplemental Fig. S7). A similar increase was seen in the pro- 
portion of genes showing misexpression in F x hybrids (Fig. 2B). 
The much more extensive misexpression seen in mel-sim F x hy- 
brids compared with mel-mel or sim-sech F x hybrids is consistent 
with mel-sim F x hybrid females having morphological defects that 
cause sterility (Dickinson et al. 1984) and mel-mel and sim-sech 
F x hybrid females being completely fertile (Lachaise et al. 1986; 
Hollocher et al. 1997). 

To further investigate the inheritance of gene expression 
levels and how inheritance patterns change over evolutionary 
time, we considered each gene separately and classified its ex- 
pression in Fi hybrids as dominant, additive, misexpressed (i.e., 
over- or under-dominant), or similar (Supplemental Fig. S8). To 
minimize the impact of small but statistically significant expres- 
sion differences on this analysis (Supplemental Fig. S9), we con- 
sidered expression similar between genotypes if the expression 
difference was less than 1.25-fold. In the mel-mel F x hybrids, we 
found that 7% of genes showed additivity, 14% showed mis- 
expression, and 43% showed dominant inheritance. The remain- 
ing 36% of genes showed similar expression in both strains of D. 
melanogaster and in their F x hybrids. The proportions of genes with 
additive and dominant inheritance decreased consistently with 
divergence time, whereas the proportion of genes showing mis- 
expression increased dramatically with divergence time (Fig. 2D). 



Using allele-specific RNA-seq reads to study regulatory 
evolution 

Differences in gene expression can be caused by changes in ris- 
and/or trans-regulation. Understanding the relative contribution 
of these two types of changes is critical for understanding the mode 
of regulatory evolution (Gordon and Ruvinsky 2012). To separate 
the effects of ris- and trans-regulatory divergence, we analyzed al- 
lele-specific expression in F x hybrids and contrasted it with com- 
parable measures of total expression differences between parental 
genotypes derived from allele-specific reads in "mixed parental" 
samples. These mixed parental samples were constructed in silico 
by combining equal numbers of mapped RNA-seq reads from each 
parental genotype and subjected to the same bioinformatic anal- 
ysis as the reads from F x hybrids. Expression differences between 
alleles in F x hybrids were attributed to ris-regulatory differences, 
and differences in relative expression between parental genotypes 
that were not explained by differences in ris-regulatory activity were 
attributed to trans-regulatory divergence (Wittkopp et al. 2004). 

For each F x hybrid and mixed parent sample, RNA-seq reads 
that aligned perfectly and uniquely to one parental genome but 



not the other were considered allele-specific. Genes with low 
confidence allele assignments (see Supplemental Material), fewer 
than 20 total allele-specific reads, or expression consistent with 
genomic imprinting in any comparison were excluded from 
analysis (Supplemental Table S3). For each of the remaining 4851 
genes, differences in the number of allele-specific reads among 
comparisons were eliminated by using hypergeometric sampling 
to produce a data set with the same number of allele-specific 
reads in all comparisons (Data set 2). Measures of relative total 
expression derived from allele-specific reads in the mixed parental 
samples were strongly correlated with measures of relative total 
expression derived from the full RNA-seq data set (Supplemental 
Fig. S10) and pyrosequencing (Supplemental Fig. S2B). Relative 
allele-specific expression in F x hybrids also showed a strong cor- 
relation between the RNA-seq and pyrosequencing data (Supple- 
mental Fig. S2C) and was similar in F x hybrids from reciprocal 
crosses (Supplemental Fig. Sll). In the analyses described below, 
hybrids from reciprocal crosses were considered separately, with 
results from one hybrid for each comparison presented in the main 
text and results from the other hybrid presented in the Supple- 
mental Material. With few exceptions (noted below), results were 
similar between reciprocal hybrids. 

Evolution of cis- and fra/w-regulation 

To determine the rate of ris-regulatory divergence and compare it 
with the rate of total expression divergence for the same genes, we 
contrasted overall differences in relative allelic abundance be- 
tween the F x hybrid and mixed parental samples for the 485 1 genes 
deemed suitable for measuring allele-specific expression (Data 
set 2). Compared with the 7587 genes discussed above (Data set 1), 
this set of genes showed more similar levels of overall expression 
differences among the three comparisons (Fig. 3A; Supplemental 
Figs. S12A, S13A-C), resulting from increased expression di- 
vergence in mel-mel and sim-sech in Data set 2 relative to Data set 
1 (Supplemental Fig. SI 4). Despite this similarity in total expres- 
sion differences among comparisons, we found that ris-regulatory 
differences were greater between than within species, with similar 
differences in relative ris-regulatory activity observed in sim-sech 
and mel-sim (Fig. 3 A; Supplemental Figs. S12A, S13D-I). Com- 
paring the proportions of genes with statistically significant dif- 
ferences in total expression and ris-regulatory activity showed a 
similar pattern, except that the proportion of genes with evidence 
of a ris-regulatory difference increased consistently and signifi- 
cantly with divergence time (Fig. 3B; Supplemental Fig. S12B). 
This suggests that the greater overall ris-regulatory divergence 
observed in the sim-sech comparison for these 4851 genes results 
from large differences in relative ris-regulatory activity for some 
genes rather than an excess of genes with divergent ris-regulatory 
activity. 

We also compared the evolutionary trajectories of individual 
genes for total expression differences (Fig. 3C) and relative ris- 
regulatory activity (Fig. 3D; Supplemental Fig. SI 5) by dividing the 
4851 genes in Data set 2 into the same nine classes described above 
for Data set 1 (Fig. 2C). Compared with total expression, we found 
that more genes showed consistent and small (median = 1.16-fold) 
differences in relative ris-regulatory activity in all three compari- 
sons (Fig. 3C, II, 3D, II). We also observed more genes with unique 
differences in ris-regulatory activity in sim-sech (Fig. 3D, IIIc,d) 
and mel-sim (Fig. 3D, IIIe,f) that were greater in these comparisons 
than the other two comparisons. In other words, genes with a 
similar difference in ris-regulatory activity in mel-mel and mel-sim 
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Figure 3. Evolution of cis- and trans- regulation. (A) Overall differences (1 - p) in total expression be- 
tween genotypes (blue) and allele-specific expression in Ft hybrids (red) are shown for each comparison, 
with data used for these calculations shown in Supplemental Figure SI 3. Relative allelic expression in Ft 
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10 4 . Comparable analyses for reciprocal hybrids are shown in Supplemental Figures SI 2, SI 5. 



but not sim-sech were more likely to 
show increased than decreased diver- 
gence in sim-sech relative to the other 
two comparisons. Such asymmetry was 
much less pronounced for levels of total 
expression (Fig. 3C), suggesting that trans- 
acting changes have compensated for dif- 
ferences in c/s-regulatory activity in 
many cases. 

Differences between divergent c/s- 
regulatory activity and total gene ex- 
pression are caused by the divergence of 
trans-regulatory factors. We found that 
significantly more genes showed evidence 
of trans-regulatory differences in the mel- 
mel and mel-sim comparisons than in the 
sim-sech comparison (Fig. 3B; Supple- 
mental Fig. S12B). This suggests that c/s- 
regulatory divergence accounts for a 
larger proportion of overall expression 
divergence in sim-sech than in mel-mel 
or mel-sim. Consistent with this infer- 
ence, a regression analysis showed that 
c/s-regulatory differences explained more 
of the expression differences between 
D. simulans and D. sechellia than between 
either of the other two pairs of genotypes 
(Supplemental Fig. SI 6). 

As overall sequence divergence in- 
creases, the number of loci with variation 
affecting expression of each gene is also 
expected to increase. Consistent with this 
expectation, we found that the propor- 
tion of genes with regulatory changes 
showing evidence of both cis- and trans- 
regulatory changes increased with diver- 
gence time, although the increase be- 
tween the mel-mel and sim-sech com- 
parisons was only statistically significant 
for one of the two hybrids (Fig. 3E; Sup- 
plemental Fig. S12C). For the majority of 
these genes, the cis- and trans-regulatory 
changes favored expression of alternative 
alleles (Fig. 3F; Supplemental Fig. SI 2D), 
suggesting that stabilizing selection has 
favored regulatory mutations that reduce 
expression differences. As described above, 
this type of developmental systems drift 
(True and Haag 2001) is thought to cause 
misexpression in Fi hybrids (Michalak 
and Noor 2004; Ranz et al. 2004; Landry 
et al. 2005; McManus et al. 2010; Barriere 
et al. 2012; Maheshwari and Barbash 
2012). The frequency of genes with com- 
pensatory cis- and trans-regulatory changes 
did not increase steadily with divergence 
time; however, cis- and trans-regulatory 
changes favoring expression of opposite 
alleles were observed least often in the sim- 
sech comparison (Fig. 3F; Supplemental 
Fig. SI 2D). Contrary to prior studies 
(Landry et al. 2005; Tirosh et al. 2009; 
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McManus et al. 2010), we found that genes affected by cis- and 
trans-regulatory changes with opposing effects on total expression 
levels were not more likely to show misexpression in F x hybrids 
(Fig. 3G; Supplemental Fig. S12E). 

To determine how the relative effects of cis- and trans-regu- 
latory changes vary with divergence time, we calculated the per- 
centage of total regulatory divergence attributable to ds-regulatory 
changes for each gene. This value is referred to as "percent cis" 
(% cis), and prior studies of flies (Wittkopp et al. 2008; McManus et al. 
2010) and yeast (Emerson et al. 2010) found it to be larger between 
than within species. We also found that % cis was larger between 
than within species; however, in contrast to prior predictions 
(Wittkopp et al. 2008), % cis did not increase systematically with 
divergence time. Rather, it was largest for the sim-sech comparison 
with intermediate divergence time (Fig. 4 A; Supplemental Fig. S17A). 
A correlation between % cis and total expression divergence for 
individual genes was previously reported between D. melanogaster 
and D. sechellia (McManus et al. 2010), but we did not observe 
this pattern for any of the three comparisons (Supplemental Fig. 
S17B-G). Finally, two prior studies (Lemos et al. 2008; McManus 
et al. 2010) reported that % cis was higher for genes showing 
additive than nonadditive (i.e., dominant, over-dominant, or 
under-dominant) inheritance. We observed this relationship only 
for the comparison of D. simulans andD. sechellia in one hybrid (Fig. 
4B; Supplemental Fig. S17H), suggesting that it is also not a general 
feature of regulatory evolution. 

Discussion 

Researchers have been comparing genomic patterns of expression 
divergence among species for over a decade using microarrays, but 
sequence divergence between microarray probes and RNA samples 
often complicates comparisons among species and differences in 
normalization and statistical analyses can complicate comparisons 
among studies. Here, we use RNA-seq data to determine the tempo 
and mode of regulatory evolution among four divergent strains 
and species of Drosophila. This technique is better suited for in- 
terspecific comparisons than microarrays because it uses full se- 
quence information instead of hybridization signals to determine 
gene expression levels, allowing more direct comparisons among 
species and studies. 

RNA-seq was also recently used to compare expression levels 
in six different tissues among nine mammalian species and a bird 



(Brawand et al. 2011). Using Spearman's rank correlation co- 
efficient p to compare overall expression differences in each pair of 
species, this study showed that expression similarity decreased 
quickly over shorter divergence times and then slowed. Patterns of 
expression divergence were strikingly similar among RNA samples 
from brain (cerebral cortex or whole brain without cerebellum), 
cerebellum, heart, kidney, and liver, with accelerated expression 
divergence in RNA samples from testes (Brawand et al. 2011). 
By combining our data with data from three previous studies 
(McManus et al. 2010; Meisel et al. 2012; Suvorov et al. 2013), we 
found that expression divergence among Drosophila species 
showed a similar pattern to that of mammals, but on a different 
timescale (Fig. 5A,B). The Drosophila data showed greater expres- 
sion divergence (lower values of p) than the mammalian data, 
which could be due to differences in tissue size among Drosophila 
species given that whole bodies rather than single tissues were used 
to generate these data. RNA-seq has also been used to compare 
expression divergence among four species of yeast (Busby et al. 
2011), but it is difficult to compare the tempo in yeast to that of 
Drosophila and mammals because only three divergence time points 
were sampled (Fig. 5C). 

For each gene, interspecific expression differences can be 
caused by cis- and/ or trans-regulatory changes. When F x hybrids 
can be made between species, measures of allele-specific expres- 
sion can be used to disentangle the net effects of these two types of 
changes (Wittkopp et al. 2004). Such analyses have been reported 
for closely related pairs of strains or species in yeast (Tirosh et al. 
2009; Emerson et al. 2010; Schaefke et al. 2013), flies (Graze et al. 
2009; McManus et al. 2010; Coolon et al. 2012), plants (Shi et al. 
2012; Bell et al. 2013), fishes (Murata et al. 2012; Shen et al. 2012), 
and mice (Goncalves et al. 2012). To the best of our knowledge, this 
is the first genomic study collecting data on cis- and trans-regula- 
tory divergence for more than one pair of genotypes. As such, it 
provided unprecedented insight into the rate at which cis- and 
trans-regulatory changes evolve and allowed us to better assess the 
generality of relationships reported in other studies. 

Compensatory cis- and fra/w-regulatory changes are common 

We found that the number of genes with evidence for ds-regula- 
tory divergence increased linearly with divergence time, but the 
number of genes with differences in total expression did not (Fig. 3B; 
Supplemental Fig. SI 2D). This suggests that trans-regulatory factors 
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Figure 4. Effects of cis- regulatory divergence. (A) The percentage of total regulatory divergence attributable to cis- regulatory divergence (% cis) is 
shown for the mel-mel, sim-sech, and mel-sim comparisons. (B) % cis is compared for sets of genes showing additive ("A") and nonadditive ("NA" 
[dominant or misexpression]) inheritance for each comparison. In all panels, notched box plots show the full range of values as well as the 25th, 50th, and 
75th percentiles. Within both panels, the widths of the boxes are proportional to the number of genes represented. Statistical significance of differences 
between median values connected with solid lines was determined using Mann-Whitney IZ-tests. (*) P < 0.05, (**) P < 0.001, (***) P < 1 x 10 -4 . 
Comparable analyses for reciprocal hybrids are shown in Supplemental Figure SI 7. 



802 Genome Research 



www.genome.org 



Tempo and mode of regulatory evolution 



A. 



Mammals 




C. 



0.9 -j 

0.8 
cl0.7 
■2 0.6 
E 0.5 

| °- 4 
00 0.3 

0.2 -j 

0.1 



• Brawand etal. 2011 



100 150 200 
Divergence time (my) 

Drosophila 



250 



300 




O Suvorov etal. 2013 
O This study 
O McManus et al. 2010 
• Meisel etal. 2012 



10 15 20 

Divergence time (my) 

Yeast 



25 



30 



• Busby etal. 2011 



10 

Divergence time (my) 



15 



20 



Figure 5. Expression divergence in mammals, Drosophila, and yeast. (A) 
Expression similarity (Spearman's p) was calculated using RNA-seq data 
from kidneys published in Brawand et al. (2011) comparing human 
samples with those of eight other mammalian species and one bird. We 
chose to analyze the data from kidneys because they were the most rep- 
resentative of all the tissues examined (excluding testes). Divergence times 
in millions of years are as reported in Brawand et al. (201 1). (B) Expression 
similarity (Spearman's p) was calculated for data described in this paper 
(light gray circles) as well as data published in Suvorov et al. (201 3) (open 
circles), McManus et al. (201 0) (gray circles), and Meisel et al. (201 2) (black 
circles). Divergence times for mel-mel, sim-sech, and mel-sim are as de- 
scribed in Figure 1A. For all other comparisons, estimated divergence times 
from Obbard et al. (201 2) were used. (C) Expression similarity (Spearman's p) 
was calculated using the data reported in Busby et al. (201 1 ) for all pairwise 
comparisons of four yeast species. Divergence times for these species are 
from Kellis et al. (2003). In all three cases, the black line connects the av- 
erage value of p for each divergence time sampled. 



might often compensate for ds-regulatory differences at the level 
of total gene expression, either by fixing compensatory trans-reg- 
ulatory variants or by feedback mechanisms affecting availability 
or activity of trans-acting factors (McManus et al. 2014). Consistent 
with this interpretation, cis- and trans-acting changes affecting 
expression of the same gene had opposite effects on expression 
levels 79%, 73%, and 87% of the time in the mel-mel, sim-sech, 
and mel-sim comparisons, respectively (Fig. 3F). The exponential 
accumulation of genes that are misexpressed in Fi hybrids (Fig. 2D) 
is also consistent with compensatory changes playing an impor- 
tant role in maintaining gene expression levels over evolutionary 
time (Landry et al. 2005). Such compensation can result from 
stabilizing selection acting to maintain similar expression levels in 
the face of new mutations, and has been seen not only in flies, but 
also in yeast (Tirosh et al. 2009), mice (Goncalves et al. 2012), and 
plants (Shi et al. 2012). 

Compensation for ds-regulatory divergence resulting from 
the fixation of trans-acting changes could evolve by fixing cis- 
acting mutations first and then compensating trans-acting muta- 
tions, or vice versa. We favor the latter model because trans-acting 
mutations appear to arise more frequently than ds-acting mu- 
tations for individual genes (Gruber et al. 2012) and most trans- 
acting mutations that compensate for ds-regulatory divergence of 
one gene are expected to have deleterious pleiotropic effects on 
expression of other genes (Wray et al. 2003; Carroll 2008; Stern and 
Orgogozo 2008). Goncalves et al. (2012) favored a similar expla- 
nation for the extensive compensatory cis- and trans-regulatory 
changes they observed between strains of mice. An example of 
such trans-regulatory divergence subsequently compensated for by ds- 
regulatory changes has been described in yeast (Kuo et al. 2010). Re- 
gardless of which type of regulatory mutation is usually fixed first, it is 
clear that the regulatory networks controlling gene expression 
evolve more rapidly than the output from these networks. 

Relative impact of selection and drift on regulatory evolution 

A common goal for comparative studies of gene expression is 
identifying the selective and nonselective forces responsible for 
patterns of divergence and conservation, but this is not straight- 
forward (Gilad et al. 2006b; Fay and Wittkopp 2008; Emerson et al. 
2010). Without the biological replication needed to make statisti- 
cally robust inferences based on alternative evolutionary models 
(e.g., Rifkin et al. 2003; Fay and Wittkopp 2008; Bedford and Hartl 
2009; Brawand et al. 2011), we can only make speculative state- 
ments about the evolutionary processes responsible for each of the 
nine different trajectories of expression divergence we observed 
(Figs. 2C, 3C,D). For example, genes with similar (and typically 
small) expression differences in all three comparisons (class II in 
Figs. 2C and 3C,D) may either have low mutation-drift variance 
or be subject primarily to stabilizing selection. This is the most 
abundant class of genes for both total expression and ds-regulatory 
activity with 43% and 34% of genes showing this pattern for total 
expression in Data sets 1 and 2, respectively, and 48% of genes 
showing this pattern for differences in ds-regulatory activity in 
Data set 2. This is consistent with prior work suggesting that sta- 
bilizing selection has had a larger impact on the evolution of gene 
expression than genetic drift (Hsieh et al. 2003; Rifkin et al. 2003; 
Lemos et al. 2005; Gilad et al. 2006a; Xing et al. 2007; Kalinka et al. 
2010). Indeed, <2.2% of genes in each comparison showed the 
increasing differences in total expression and/or ds-regulatory 
activity with divergence time (class I in Figs. 2C, 3C,D, and 
Supplemental Fig. SI 5) that are expected when expression evolves 
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primarily due to genetic drift (Khaitovich et al. 2004; Gilad et al. 
2006a). The remaining genes fell into one of seven categories 
consistent with variable selection pressures among lineages (class 
III in Figs. 2C and 3C,D). 

Lineage-specific regulatory changes in D. sechellia 

Gene-specific patterns of total expression divergence consistent 
with lineage-specific selection were more abundant in sim-sech 
than mel-mel or mel-sim for both Data sets 1 and 2 despite the 
sim-sech comparison having an intermediate divergence time 
(Figs. 2C, 3C). This is consistent with D. sechellia being an island 
endemic species with a small effective population size that has 
evolved many novel phenotypes relative to D. melanogaster and 
D. simulans (Orgogozo and Stern 2009), including adaptation to 
a new host plant (Jones 2005). As a consequence of this evolu- 
tionary history, D. sechellia might have fixed more deleterious 
mutations than the other two species by drift as well as more 
adaptive substitutions by positive selection. We observed an ap- 
parent excess of ds-regulatory divergence between D. simulans and 
D. sechellia (Figs. 3 A, 4 A; Supplemental Fig. SI 6) that we believe is 
more likely to result from positive selection than drift because 
(1) trans-acting variation contributes more than ris-acting varia- 
tion to polymorphic expression within species (Lemos et al. 2008; 
Wittkopp et al. 2008; Emerson et al. 2010), suggesting that drift is 
more likely to fix trans-acting than ris-acting variants; (2) cis- and 
trans-regulatory changes affecting expression of the same gene 
were most likely to act in the same direction in the sim-sech 
comparison (Fig. 3F), which is consistent with positive, directional 
selection; and (3) simulation studies have shown that ris-regula- 
tory divergence is more likely to be driven by natural selection than 
trans-regulatory divergence (Emerson et al. 2010). These results 
emphasize the importance of considering not only divergence time, 
but also the demographic and ecological history of individual spe- 
cies when studying the tempo and mode of evolution. 

Methods 

Fly strains, rearing, and collections 

Four Drosophila genotypes were used for this study: the D. melanogaster 
North American zhr strain [full genotype: XYS.YL.Df (l)Zhr] (Sawamura 
and Yamamoto 1993; Fenee and Barbash 2009), the D. melanogaster 
Zimbabwean isofemale strain z30 (Begun and Aquadro 1993; Wu 
et al. 1995), the sequenced D. sechellia strain (droSecl [14021- 
0428.25]), and an isofemale strain of D. simulans (Tsimbazaza) 
that mates well with D. melanogaster (Hollocher et al. 2000). All flies 
were reared on cornmeal medium using a 16:8 lightdark cycle at 
20°C. Just prior to the start of the experiment, all strains were sub- 
jected to 10 generations of sibling pair matings to reduce genome- 
wide heterozygosity, followed by three generations of population 
expansion to generate the quantity of flies needed for crosses. For 
each cross between strains of D. melanogaster, 10 vials were set up 
with three female and three male flies each. For each interspecific 
cross, 30 vials were set up with three female and three male flies 
each. Virgin female progeny were allowed to mate from the time of 
eclosion to 3 d post-eclosion, then males and females were separated 
and females aged to 7-10 d post-eclosion. All flies were collected 
between 9 and 10 am to minimize the effects of circadian rhythm 
and snap-frozen in liquid nitrogen. 

Sample preparation and sequencing 

For each genotype analyzed, a pool of 20 female flies was used 
for total RNA extraction with TRIzol reagent according to 



manufacturer instructions (Invitrogen). This incorporates varia- 
tion from biological replication into a single sample. Prior work has 
shown that expression for most genes is similar among replicate 
pools constructed in this way (Wittkopp et al. 2004, 2008; Coolon 
et al. 2012). Genomic DNA (gDNA) was extracted from a separate 
pool of 20 flies for each genotype using the DNeasy Blood & Tissue 
Kit (Qiagen). Illumina sequencing libraries for RNA-seq were pre- 
pared as previously reported (McManus et al. 2010; Coolon et al. 
2012). Briefly, 10 \xg of total RNA from each sample was treated 
with DNase I (Invitrogen) followed by poly(A) + selection using 
Dynal magnetic beads (Invitrogen). Poly(A) + RNA was fragmented 
using RNA fragmentation reagent (Ambion) before cDNA synthe- 
sis. Double-stranded cDNA was produced using random hexamers 
and Superscript II reverse transcriptase (Invitrogen). cDNA was run 
on a 2% agarose gel and the region corresponding to ~300-bp 
fragments was extracted. The size-selected double-stranded cDNA 
extracted from this gel slice was used in the Paired-End Genomic 
DNA Library Preparation Kit (Illumina) according to manufac- 
turer's recommendations. For the gDNA sequencing libraries, 10 ng 
of gDNA was used with the Paired-End Genomic DNA Library 
Preparation Kit (Illumina), following manufacturer's recommen- 
dations. Each cDNA and gDNA library was subjected to a full lane 
of paired-end sequencing on an Illumina Genome Analyzer IIx 
using 76 cycles. On average, 24 million 76-bp, paired-end sequence 
reads were generated from each sequencing library (Supplemental 
Table SI). The zhr gDNA sample was also sequenced from a single 
end on an additional lane for 76 cycles per read. Images were an- 
alyzed using the Firecrest and Bustard modules to generate se- 
quence and quality scores for each read. 

Resequencing, genome assembly, and sequence divergence 

Using the gDNA sequences, we constructed a strain-specific ge- 
nome sequence for each genotype as described in the Supple- 
mental Material. To determine percent sequence divergence in 
each comparison (mel-mel, sim-sec, mel-sim), we created reverse 
chain files to liftOver coordinates from D. melanogaster dm3 space 
to each of the other strain or species genomic space (zhr, z30, 
Tsimbazaza, droSecl) using the chainSwap utility from the UCSC 
Genome Browser (Kent et al. 2002). Using these chain files, we 
converted the dm3 genomic coordinates for each exon used for 
quantification in this study into their respective strain- or species- 
specific genomic coordinates. Using these coordinates, sequences 
for each exon were extracted from each strain- or species-specific 
genome. These sequences were aligned in pairs using Fast Statis- 
tical Alignment (FSA) (Bradley et al. 2009), and the number of di- 
vergent sites per gene was determined using custom perl scripts 
(pairwise_aln_FSA.pl, compare_pairwise.pl, seq_div_from_set.pl) . 
Strain-specific genomes and chain files are provided in Supple- 
mental File 1, and all custom perl scripts are included in Supple- 
mental File 2. 

Mapping sequencing reads to genes and alleles 

We built a bioinformatics pipeline to measure total and allele- 
specific expression from Illumina sequencing outputs similar to 
those reported previously (McManus et al. 2010; Coolon et al. 
2012). This pipeline, as well as the pyrosequencing methods used 
to validate measures of total and allele-specific expression derived 
from this pipeline, is described in the Supplemental Material. 

Normalizing RNA-seq read counts among comparisons 

Different numbers of sequence reads were recovered for each of the 
10 cDNA libraries sequenced. These differences in read counts 
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caused the Fisher's exact tests used to identify significant changes 
in gene expression between pairs of genotypes to have differences 
in power among the mel-mel, sim-sech, and mel-sim comparisons. 
To equalize power in all three comparisons, we considered exactly 
12,704,991 mapped reads from each RNA-seq data set by down- 
sampling mapped reads randomly without replacement in all but 
the D. sechellia data set, which already had exactly 12,704,991 
mapped reads (Supplemental Table S2). A similar down-sampling 
strategy was recently used to investigate the power of different bio- 
informatic tools for identifying expression differences (Rapaport 
et al. 2013). We then excluded genes with fewer than 20 reads 
in any of the RNA-seq data sets, resulting in the same 7587 
"expressed" genes being analyzed in each comparison (Supple- 
mental Table S2). Simulations confirmed that a larger data set 
down-sampled in this way has the same power to detect significant 
expression differences with a Fisher's exact test as a data set origi- 
nally collected at the smaller sample size (data not shown). The 
exact data analyzed are provided in Supplemental Material as 
Data set 1. 

Comparing total expression among genotypes 

Spearman's correlation coefficients (p) were used to measure 
overall expression differences between pairs of genotypes, fol- 
lowing Brawand et al. (2011) and Meisel et al. (2012). Unlike 
Pearson's r, Spearman's p makes no assumptions about normality, 
linearity, or homoscedasticity. It is also less sensitive to outliers. 
Bootstrapping was used to test for statistically significant differ- 
ences in p between mel-mel and sim-sech and between sim-sech 
and mel-sim by sampling with replacement 7587 gene-specific 
read counts from the observed 7587 genes 10,000 times using R, 
calculating p in each case, and determining the 2.5% and 97.5% 
percentiles. Significant differences were inferred when these 95% 
quantiles did not overlap. 

We also tested for significant differences in expression level of 
individual genes by comparing the number of reads mapping to 
the focal gene to the number of reads mapping to the other 7586 
genes between parental types, between reciprocal hybrids, and 
between each hybrid and parent using Fisher's exact tests with 
a null hypothesis of equal expression in both samples. This test was 
used instead of other methods for detecting differential expression 
because it recovers a similar proportion of true positives with fewer 
false positives without requiring replicates (Tarazona et al. 2011). 
Fisher's exact tests were also used to test for significant differences 
in the proportion of genes with significant differences between 
mel-mel and sim-sech and between sim-sech and mel-sim. 

Inferring the mode of inheritance 

To determine the mode of inheritance for each gene in each 
comparison, we followed the logic outlined in Gibson et al. (2004) 
and used previously for RNA-seq data in McManus et al. (2010). 
Using a 1.25-fold expression difference cutoff and total expression 
levels in the F x hybrids and corresponding parental genotypes, 
we classified each gene as either "similar," "additive," "parent 1 
dominant," "parent 2 dominant," "under-dominant," or "over- 
dominant." Dominant inheritance was inferred when total ex- 
pression in the F x hybrid was similar to expression in one of the 
parental genotypes but different from the other parental genotype. 
Such genes were classified as either "parent 1 dominant" or "parent 2 
dominant" depending on which parent the F x hybrid resembled. 
Additive inheritance was inferred when F x hybrid expression 
was different from, and intermediate to, both parents; and mis- 
expression was inferred when the total expression in the F x hybrid 
was different from both parental genotypes and greater than (over- 



dominant) or less than (under-dominant) the more extreme pa- 
rental expression level. Genes with similar expression in both 
parents and F x hybrids were classified as similar. Fisher's exact tests 
were used to test for significant differences in the proportion of 
genes in each category between mel-mel and sim-sech and be- 
tween sim-sech and mel-sim. 

Normalizing allele-specific RNA-seq read counts 
among comparisons 

To equalize power when testing for ds-regulatory divergence in 
mel-mel, sim-sech, and mel-sim, as well as when comparing tests 
for ris-regulatory and total expression divergence, we created 
a second data set with the same number of allele-specific reads for 
each gene in all comparisons. This data set was constructed by (1) 
combining the equal numbers of mapped reads for each genotype 
used in the first data set to make a "mixed parental" sample for 
each comparison (e.g., reads from zhr and z30 were combined for 
the mel-mel comparison); (2) counting allele-specific reads (i.e., 
reads that mapped perfectly and uniquely to only one of the pa- 
rental genomes) in all mixed parental and F x hybrid samples; and 
(3) equalizing allele-specific read counts for each gene in all mixed 
parental and hybrid samples by identifying the sample with the 
fewest allele-specific reads for that gene and using hypergeometric 
sampling of the observed allele-specific read counts to randomly 
reduce the number of allele-specific reads considered in each of the 
other samples. Simulations confirmed that this down-sampling 
approach produced data sets with the same power to detect sig- 
nificant expression differences with Fisher's exact tests as data sets 
originally collected at the smaller sample sizes (data not shown), 
and a similar method was recently used for allele-specific RNA-seq 
data from humans (Lappalainen et al. 2013). 

Prior to analysis, genes with low confidence allele-assign- 
ments in the mel-mel, sim-sech, or mel-sim comparisons, defined 
as having >10% of the mapped reads from one parent aligned 
solely to the genome of the other parent, were excluded. Genes 
with less than 20 total allele-specific reads (allele 1 + allele 2 < 20) in 
any mixed parental or hybrid sample were also excluded from all 
comparisons; this threshold was based on prior theoretical and 
empirical work (Fontanillas et al. 2010; McManus et al. 2010). Fi- 
nally, nine more genes were excluded because they showed sig- 
nificant differences in relative allelic expression between re- 
ciprocal hybrids using Fisher's exact tests with a null hypothesis of 
equal expression and an FDR of 0.05. Such differences in relative 
allelic expression can result from parent-of-origin effects such as 
mitochondrial inheritance or genomic imprinting; imprinting 
seems rarely, if ever, responsible for this pattern of expression in 
Drosophila, however (Wittkopp et al. 2006, 2008; Coolon et al. 
2012). After applying these filters, 4851 genes were deemed suit- 
able for allele-specific analysis in all comparisons, with most of the 
genes excluded from this data set because they had too few allele- 
specific reads in the mel-mel comparison (Supplemental Table S3). 

Mitochondrial genes were excluded from our allele-specific 
data set; however, allele assignments for F x hybrid reads that 
mapped to mitochondrial genes were used as one metric to eval- 
uate the reliability of our bioinformatic allele assignments. In the 
absence of sequencing and allele-assignment errors, all of these 
reads should map to the maternal allele. We found that 99.5% and 
99.8% of reads from mitochondrial genes mapped to the maternal 
allele in F x hybrids between D. simulans and D. sechellia and be- 
tween D. melanogaster and D. simulans, respectively (Supplemental 
Table S4). Additional validation of allele assignments is described 
in the main text. 

The exact data analyzed are provided in the Supplemental 
Material as Data set 2. 
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Evaluating cis- and fra/75-reguIatory changes 

Spearman's p was used to measure ds-regulatory divergence on 
a genomic scale in the mel-mel, sim-sech, and mel-sim compari- 
sons by assessing the correlation between allele 1 and allele 2 read 
counts from F x hybrids. It was also used to repeat the analysis of 
overall expression divergence in each comparison using the mixed 
parental samples. To test for statistically significant differences in p 
between mel-mel and sim-sech and between sim-sech and mel- 
sim, we used bootstrapping. Specifically we sampled with re- 
placement 4851 gene-specific read counts from the observed 4851 
genes 10,000 times using R, calculated p in each case, and de- 
termined the 2.5% and 97.5% percentiles. Significant differences 
were inferred when these 95% quantiles did not overlap. 

Binomial exact tests with a null hypothesis of equal expres- 
sion were used to identify significant expression differences be- 
tween genotypes in the mixed parental pools as well as significant 
differences in relative allelic expression in the F x hybrid samples 
that indicate differences in relative ds-regulatory activity. An FDR 
of 5% was used to determine statistical significance despite the fact 
that the P-values produced by binomial exact tests when the null 
hypothesis is true are not uniformly distributed as assumed by the 
FDR correction for multiple tests (Skelly et al. 201 1). This is because 
our simulations showed that the violation of this assumption had 
no effect on the number of genes called significant in this study 
(Supplemental Material). To test for the unequal allelic abundance 
between mixed parental and F x hybrid samples that would indicate 
trans-regulatory divergence, we performed Fisher's exact tests with 
a null hypothesis of equal expression by comparing read counts 
from genotype 1 and genotype 2 in the mixed parental sample to 
allele 1 and allele 2 in the corresponding F x hybrid samples. Each 
gene in each comparison was classified as "conserved," "all cis," 
"all trans/' "cis + trans/' "cis X trans/' "compensatory/' or "am- 
biguous" based on the results of the Fisher's and binomial exact 
tests using the criteria described in Supplemental Table S5. These 
same classifications were used previously in Landry et al. (2005) 
and (McManus et al. 2010). Fisher's exact tests were also used to test 
for significant differences in the proportion of genes with signifi- 
cant differences between mel-mel and sim-sech and between sim- 
sech and mel-sim. 

For each gene in each comparison, the total expression dif- 
ference was calculated as log 2 (genotype 1 read count/genotype 2 
read count) from the mixed parental sample, and the ds-regulatory 
difference ("cis") was calculated as log 2 (allele 1 read count/allele 2 
read count) from each of the V 1 hybrid samples. The trans- 
regulatory difference ("trans") for each gene in each comparison 
was calculated as the difference between the total expression 
and ds-regulatory differences: log 2 (genotype 1 read count/ 
genotype 2 read count) - log 2 (allele 1 read count/allele 2 read 
count). % cis (proportion of total regulatory divergence attribut- 
able to ds-regulatory changes) was then calculated as [|cis|/(|cis| + 
|trans|)] X 100. 



Scripts and software used 

All statistical analyses, down-sampling, and simulations were 
performed in R (version 2.12.2 or version 3.0.1, CRAN) (R De- 
velopment Core Team 2008). This code includes the use of 
fisher.test for Fisher's exact tests, binom.test for binomial exact 
tests, corr.test for Spearman's p, sample to randomly down-sample 
mapped reads and simulate mapped read counts from a multivari- 
ate distribution, rhyper to randomly down-sample allele-specific 
read counts, rbinom to simulate allele-specific read counts. Cus- 
tom perl and R scripts used in this work are included in Supple- 
mental File 2. 



Data access 

The sequencing data from this study have been submitted to the 
NCBI Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/ 
sra) under accession numbers SRA052065 and SRP023274. 
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